
Bayesian Principle


Reviews: Practical Deep Learning with Bayesian Principles

Neural Information Processing Systems

Originality: Rather low. The main technical novelty lies in applying tricks from the deep learning literature to VOGN, and the experiments are fairly standard.
Quality: High. That being said, the experiments seem to be carefully executed and described in detail, and the overall method is technically sound. While not overly ambitious in terms of technical novelty, I think this is a well-executed piece of work.
Clarity: High. The paper is well-written and easy to follow.


Reviews: Practical Deep Learning with Bayesian Principles

Neural Information Processing Systems

The paper demonstrates that the Variational Online Gauss-Newton (VOGN) method of Khan et al. (2018) can be successfully scaled to deep learning architectures. The authors demonstrate the scalability of Bayesian methods to large-scale data such as ImageNet, and extensive experiments on large-scale data and models are provided. The main result is an adaptation of an existing method (VOGN) that makes it practical for deep learning.


Practical Deep Learning with Bayesian Principles

Osawa, Kazuki, Swaroop, Siddharth, Khan, Mohammad Emtiyaz, Jain, Anirudh, Eschenhagen, Runa, Turner, Richard E., Yokota, Rio

Neural Information Processing Systems

Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted. This work enables practical deep learning while preserving the benefits of Bayesian principles.
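
To make the Bayesian prediction step concrete: methods that learn a mean-field Gaussian posterior over weights, as in this paper, typically form predictions by averaging softmax outputs over a few weight samples. The following is a minimal PyTorch sketch, where `model`, `mu`, and `sigma` are hypothetical placeholders rather than the paper's actual API:

    import torch

    @torch.no_grad()
    def mc_predict(model, mu, sigma, x, n_samples=10):
        # Average softmax outputs over weight samples w ~ N(mu, sigma^2).
        backup = torch.nn.utils.parameters_to_vector(model.parameters())
        probs = 0.0
        for _ in range(n_samples):
            w = mu + sigma * torch.randn_like(mu)  # draw one weight sample
            torch.nn.utils.vector_to_parameters(w, model.parameters())
            probs = probs + torch.softmax(model(x), dim=-1)
        torch.nn.utils.vector_to_parameters(backup, model.parameters())
        return probs / n_samples  # Monte-Carlo predictive distribution

Averaging over posterior samples, rather than predicting from a single point estimate, is what tends to produce the better-calibrated predictive probabilities the abstract reports.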


Being Bayesian about Categorical Probability

Joo, Taejong, Chung, Uijung, Seo, Min-Gwan

arXiv.org Machine Learning

Neural networks utilize the softmax as a building block in classification tasks, but it suffers from overconfidence and lacks the ability to represent uncertainty. As a Bayesian alternative to the softmax, we treat the categorical probability over class labels as a random variable. In this framework, the prior distribution explicitly models the presumed noise inherent in the observed labels, which provides consistent gains in generalization performance across multiple challenging tasks. The proposed method inherits the advantages of Bayesian approaches, achieving better uncertainty estimation and model calibration. Our method can be implemented as a plug-and-play loss function with negligible computational overhead compared to the softmax with the cross-entropy loss.
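
The "plug-and-play loss function" can be sketched directly: place a Dirichlet prior on the categorical probabilities, let the network output the posterior concentration, and maximise an evidence lower bound. In the sketch below, the parameterisation `exp(logits) + prior` is one assumed choice and may differ from the authors' exact implementation:

    import torch

    def belief_matching_loss(logits, labels, prior=1.0, kl_weight=1.0):
        # Posterior concentration; exp(logits) + prior is an assumed parameterisation.
        alpha = logits.exp() + prior
        a0 = alpha.sum(-1, keepdim=True)
        # E_q[log pi_y] under Dirichlet(alpha): digamma(alpha_y) - digamma(alpha_0)
        ell = (torch.digamma(alpha.gather(-1, labels.unsqueeze(-1))).squeeze(-1)
               - torch.digamma(a0).squeeze(-1))
        # Closed-form KL( Dirichlet(alpha) || Dirichlet(prior, ..., prior) )
        k = logits.size(-1)
        kl = (torch.lgamma(a0.squeeze(-1)) - torch.lgamma(alpha).sum(-1)
              - torch.lgamma(torch.tensor(prior * k))
              + k * torch.lgamma(torch.tensor(prior))
              + ((alpha - prior) * (torch.digamma(alpha) - torch.digamma(a0))).sum(-1))
        return (kl_weight * kl - ell).mean()  # negative ELBO, averaged over the batch

Because everything is a closed-form function of the logits, this drops in wherever a cross-entropy loss would be used, which matches the abstract's claim of negligible computational overhead.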


Practical Deep Learning with Bayesian Principles

Osawa, Kazuki, Swaroop, Siddharth, Jain, Anirudh, Eschenhagen, Runa, Turner, Richard E., Yokota, Rio, Khan, Mohammad Emtiyaz

arXiv.org Machine Learning

Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated and uncertainties on out-of-distribution data are improved. This work enables practical deep learning while preserving the benefits of Bayesian principles. A PyTorch implementation will be available as a plug-and-play optimiser.
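
Since this version advertises a plug-and-play optimiser, a simplified view of one VOGN-style step may help: the update resembles Adam, but the second-moment term is estimated from minibatch means of per-example squared gradients (a Gauss-Newton-style approximation), and the same scale defines the Gaussian posterior from which weights are sampled. This is a hedged sketch in the spirit of Khan et al. (2018); damping, initialisation, and momentum details differ in the real optimiser:

    import torch

    def vogn_step(mu, m, s, grad_mean, grad_sq_mean, N,
                  lr=1e-3, beta1=0.9, beta2=0.999, prior_prec=1.0):
        # mu, m, s: posterior mean, momentum, and scale (flat tensors, updated in place).
        # grad_mean / grad_sq_mean: minibatch means of per-example (squared) gradients.
        delta = prior_prec / N
        s.mul_(beta2).add_(grad_sq_mean, alpha=1 - beta2)  # Gauss-Newton-style scale
        m.mul_(beta1).add_(grad_mean + delta * mu, alpha=1 - beta1)
        mu.add_(-lr * m / (s + delta))                     # preconditioned mean update
        sigma = 1.0 / torch.sqrt(N * (s + delta))          # posterior std deviation
        return sigma  # sample weights for the next step as w = mu + sigma * noise

The key difference from Adam is that the gradient is regularised toward the prior (the `delta * mu` term) and the scale `s` doubles as the posterior precision, which is what yields uncertainty estimates essentially for free.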


Large-Scale Model Selection with Misspecification

Demirkaya, Emre, Feng, Yang, Basu, Pallavi, Lv, Jinchi

arXiv.org Machine Learning

Model selection is crucial to high-dimensional learning and inference in contemporary big-data applications, where the goal is to pinpoint the best set of covariates among a sequence of candidate interpretable models. Most existing work assumes implicitly that the models are correctly specified or have fixed dimensionality, yet both model misspecification and high dimensionality are prevalent in practice. In this paper, we exploit the framework of model selection principles for misspecified models originating in Lv and Liu (2014) and investigate the asymptotic expansion of the Bayesian principle of model selection in the setting of high-dimensional misspecified models. With a natural choice of prior probabilities that encourages interpretability and incorporates the Kullback-Leibler divergence, we suggest the high-dimensional generalized Bayesian information criterion with prior probability (HGBIC_p) for large-scale model selection under misspecification. Our new information criterion characterizes the impacts of both model misspecification and high dimensionality on model selection. We further establish the consistency of covariance contrast matrix estimation and the model selection consistency of HGBIC_p in ultra-high dimensions under mild regularity conditions. The advantages of our new method are supported by numerical studies.
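
The generic skeleton behind criteria of this kind is worth showing, though HGBIC_p itself adds misspecification and high-dimensionality corrections that are not reproduced here. As a hedged baseline illustration only, a plain BIC search over small candidate covariate sets looks like this:

    import numpy as np
    from itertools import combinations

    def bic(X, y, support):
        # Gaussian-likelihood BIC for a least-squares fit restricted to `support`.
        n = len(y)
        Xs = X[:, support]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = np.sum((y - Xs @ beta) ** 2)
        return n * np.log(rss / n) + len(support) * np.log(n)

    def select_model(X, y, max_size=3):
        # Exhaustively score all candidate supports up to max_size; keep the lowest BIC.
        p = X.shape[1]
        candidates = [list(c) for k in range(1, max_size + 1)
                      for c in combinations(range(p), k)]
        return min(candidates, key=lambda c: bic(X, y, c))

Roughly speaking, HGBIC_p modifies the fixed log(n) complexity penalty above using prior probabilities that incorporate a Kullback-Leibler divergence term, which, per the abstract, is what allows selection consistency to survive misspecification and ultra-high dimensionality.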